In this paper we investigate the robustness of several deadlock
detection algorithms for distributed computing systems. We analyze
the behavior of each algorithm in the presence of two classes of
failures - lost messages and single site failures. In the case of
single site failure we consider six different types of sites depending
on how they can participate in deadlock and deadlock detection. The
observations and conclusions made in this paper are intended to show
how robust the present algorithms are and to provide insight into, and
a better understanding of, the robustness of distributed algorithms.

There have been many algorithms published for deadlock detection,
prevention or avoidance in centralized multiprogramming systems. The
problem of deadlock in those systems has been essentially solved. In
the past decade there has been considerable work done on distributed
[...] predecessors of distributed computing systems, which are presently
a focus of intensive research and development in academia and industry.
Many techniques for concurrency control, reliability/recovery or
security developed for centralized (or single CPU) systems have been or
are being adopted and adapted for distributed computing systems. For
example, there is a tendency to use locking as a general
synchronization technique in distributed systems and its special
variant, two-phase locking, for distributed database systems. Up until
recently it has been argued that the frequency of deadlock occurrence
in existing applications is so low that the problem of deadlock in
distributed systems is not very important and therefore can be managed
by adopting techniques developed for centralized systems. However, it
has recently become apparent that deadlocks may be a problem in the
future as we see new applications featuring large processes and/or many
concurrent processes or transactions [GRAY81]. As an example of such
new applications we mention information utility systems which
concurrently service hundreds or perhaps thousands of TV users.

Distributed computing systems are characterized by the absence of
global memory and by message transmission delays which are not
negligible. Additionally, the processes operating at the same or
different sites can communicate with each other, and can share
resources. If locking is used as the synchronization technique, then
the last two items raise the problem of deadlock occurrence in
distributed systems, and the first two characteristics of distributed
systems make it much more difficult to detect, avoid or prevent than in
the earlier multiprogramming centralized computing systems.

Deadlock prevention and avoidance algorithms for distributed
computing systems are not efficient. Prevention can be accomplished
by not allowing concurrent processing, by assigning priorities and
allowing preemption, by requiring a process to acquire all resources
it will need before it starts, or by having no locks. Requiring
sequential execution in a distributed system is a gross waste of
resources. Having prioritized processes will result in
lower-priority processes being restarted many times, with a major
degradation in system efficiency. Dynamic prioritization would be a
complex algorithm by itself. A process may be unable to determine its
minimum set of resources, and therefore would have to acquire the set
of all probable and possible resources, even though it may not need
them. In addition, in systems in which messages are treated as
resources, it is impossible to determine in advance which messages will
be required. Having no locks may result in database inconsistencies,
assuming a non-optimistic concurrency controller. Similarly, deadlock
avoidance algorithms, which either calculate a 'safe path' [GOLD77] or
never wait for a lock [GRAY78], are also inefficient. Safe path
algorithms require a non-trivial execution time, and must be run each
time a resource request is to be granted. Never waiting for a lock is
inefficient when deadlock is a rare occurrence. Thus, in distributed
computing systems, deadlock detection and resolution algorithms must be
used.
[...] robustness, 3) performance, and 4) practicality. Correctness
refers to the ability of the algorithm to detect all deadlocks, and the
ability to not detect any false deadlocks. Robustness refers to the
ability of the algorithm to be correct even in the presence of
anticipated faults. This includes the ability to detect deadlocks even
when a site fails or loses communications while the deadlock detection
algorithm [...] used, number of messages required, etc. Practicality
is closely related to performance. It refers to aspects such as
complexity and cost.

Several different approaches are being used in current deadlock
detection and resolution algorithms for distributed systems. Two major
ones are centralized and distributed deadlock detection algorithms.
Within the distributed class are two subclasses: 1) all or several
sites execute the deadlock detection algorithm, and 2) only one site
is actually executing, although the algorithm is resident in all sites
and thus any site could execute the algorithm. It might be easier to
view the algorithms as a continuum: fully centralized [GRAY78],
hierarchical [MENA79], distributed with a single site at a time
executing the algorithm [GOLD77], distributed with all sites involved
in a possible deadlock executing the algorithm concurrently [MENA79],
and distributed with all sites executing the algorithm
concurrently [ISLO78].

In this paper we investigate the robustness of several published
deadlock detection and resolution algorithms for distributed systems.
The motivation for our work comes from three facts. First, very few
authors have investigated the robustness or reliability of deadlock
detection algorithms. Second, reliable deadlock detection and
resolution for upcoming new distributed systems and applications is in
our opinion an urgent, very important and as yet not satisfactorily
resolved problem. Third, as there can be more than one deadlock being
detected by the deadlock detection algorithm, it is reasonable to
expect such an algorithm to be robust, i.e., to continue executing and
detecting all [...] one of the deadlocks being detected.

The paper is organized as follows. In section two, we discuss
robustness of distributed systems. In section three, we analyze the
robustness of several existing deadlock detection algorithms with
respect to some single failures. In section four, we present our
conclusions based on the analysis of section three.

II. SOME THOUGHTS ON ROBUSTNESS IN DISTRIBUTED SYSTEMS.

In this paper we want to investigate the robustness of deadlock
detection algorithms (DDA), i.e., we want to find out the impact of
some single failures on such algorithms. In general, the DDA is
invoked by one of two events: either whenever a process waits for a
resource, or after a certain period of time has elapsed since the last
DDA invocation. In the first case, deadlock is checked for whenever
its possibility exists, and in the second case it is checked for
periodically, i.e., regardless of whether its possibility exists.

The DDA can reside in one, several or all sites of the distributed
computing system. When a triggering event for the DDA occurs, then
depending on the particular algorithm one, several or all sites will
[...] arcs of the wait-for graph, strings, or lists of processes or
transactions. Upon receipt of such information one, several or all
sites attempt to reconstruct a global state of the distributed system,
i.e., to generate a true snapshot of all processes, or of all waiting
processes, in the system.
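
As an illustration of what reconstructing a global state involves, the
following minimal Python sketch merges wait-for arcs reported by
individual sites into one graph and searches it for a cycle. It is not
taken from any of the algorithms analyzed in this paper; the function
names, the (waiter, holder) arc format and the site and transaction
labels are assumptions made only for the illustration.

```python
from collections import defaultdict


def merge_site_arcs(site_reports):
    """Union the wait-for arcs reported by each site into one graph.

    site_reports maps a site name to a list of (waiter, holder) arcs.
    Both the mapping and the arc format are hypothetical.
    """
    graph = defaultdict(set)
    for arcs in site_reports.values():
        for waiter, holder in arcs:
            graph[waiter].add(holder)
    return graph


def find_cycle(graph):
    """Return one cycle as a list of nodes, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)   # every node starts WHITE (unvisited)
    path = []                  # nodes on the current depth-first path

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph.get(node, ()):
            if color[nxt] == GREY:          # back arc: cycle closed
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for start in list(graph):
        if color[start] == WHITE:
            found = visit(start)
            if found:
                return found
    return None


# Two sites each know one arc; only the merged graph shows the cycle.
reports = {"A": [("T1", "T4")], "D": [("T4", "T1")]}
cycle = find_cycle(merge_site_arcs(reports))

# If the report from site D never arrives, the snapshot is incomplete
# and no cycle is visible.
partial = find_cycle(merge_site_arcs({"A": [("T1", "T4")]}))
```

The second call shows why a lost message or a failed site matters: the
graph that can be assembled from the surviving reports contains no
cycle, even though a deadlock exists.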

The generation of such a true snapshot in the distributed system
is difficult because of the lack of global memory and the message
delays, which are not negligible and can vary considerably. The
generation of such a true snapshot, usually referred to as a global
wait-for graph, becomes even more difficult when we consider the
possibility of failures in the distributed system. Some system
mechanisms have been designed to be robust or reliable. For example,
some concurrency control or synchronization mechanisms for distributed
databases and transaction processing systems are based on two phase
locking, which has been made robust by incorporating atomicity through
two phase commit protocols. The two phase commit protocol supports not
only the atomicity of transactions but also the robustness of locking,
i.e., the robustness of concurrency control mechanisms. In particular,
what makes concurrency control which uses locking robust is the need to
lock and unlock resources in a robust way, i.e., either all lock/unlock
operations for a given process or transaction occur or none occur.
Thus in some sense, the robustness of concurrency control is meant to
support the atomicity of placing and releasing the set of locks needed
by a process. In other words, the robustness of concurrency control
means that no dangling locks or locked resources are left behind by a
terminated or committed process, even in the presence of some failures.
It is interesting to note that although deadlock detection is a part of
concurrency control based on locking, there has been no attempt to
provide for or even to investigate the robustness of deadlock detection
mechanisms. The most likely explanation for this is that from the
concurrency control point of view, the inability of a process to lock a
needed resource is an exception to be handled by another mechanism,
i.e., a deadlock detection algorithm (DDA).

The proper way to see the DDA is as another transaction running
under the concurrency control mechanism, as it reads and shares lock
[...] DDA is a special transaction which operates on special data it
creates solely for deadlock detection, e.g., wait-for graphs. Such
data, which we will call deadlock data, is internal to each invocation
of the DDA transaction and is erased after its execution. Moreover,
such deadlock data is not shared by any other DDA transaction
invocations and therefore need not be locked. This means that the
robustness required of DDA transactions is of a somewhat different kind
than the robustness of transactions operating on shared database data.
Thus it makes sense that the DDA transaction does not need to use two
phase commit to assure its robustness. The question then is what kind
of robustness or fault-tolerance we need for DDA transactions, and this
is precisely the problem we are addressing in this paper.

We consider the following informal model of DDA transaction
execution. The DDA is invoked by a concurrency controller at a site at
which a database transaction cannot acquire locks which are being
held by another transaction(s). The DDA transaction executes at one,
several or all sites depending on the DDA itself and the deadlock
topology. During its execution the DDA transaction should exhibit
the atomicity property, i.e., it executes or [...]
messages to the concurrency controller which has triggered it:

1) Proceed - because of  a) no deadlock

                         b) deadlock detected but another
                            transaction was selected as
                            a victim for back-up

2) Abort   - because of  a) deadlock detected and you are [...]

                         b) DDA transaction failed, i.e.,
                            it did not execute.
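
The reply model above can be written down as a small decision function.
The sketch below is only an illustration: the names Reply and reply_for
and the three boolean inputs are hypothetical, and case 2a, whose text
is incomplete in the source, is read here as "the requesting
transaction was selected as the victim".

```python
from enum import Enum


class Reply(Enum):
    """The two messages a DDA transaction can return (hypothetical names)."""
    PROCEED = "proceed"
    ABORT = "abort"


def reply_for(dda_executed, deadlock_found, requester_is_victim):
    """Map the outcome of one DDA invocation to a reply.

    Assumption: case 2a means the requester was chosen as the victim.
    """
    if not dda_executed:
        return Reply.ABORT      # case 2b: the DDA transaction failed
    if deadlock_found and requester_is_victim:
        return Reply.ABORT      # case 2a
    return Reply.PROCEED        # case 1a (no deadlock) or 1b (other victim)
```

Under this reading a failed DDA invocation is indistinguishable, from
the point of view of the concurrency controller, from a detected
deadlock in which the requester is the victim.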

The situation we investigate in this paper is when DDA
transactions fail or should not fail, i.e., how robust the existing
DDAs are or should be. In this paper we consider only two classes of
single failures. First, we investigate the impact of lost messages and
second, we investigate the impact of one site failures, or equivalently
one site partitions, on DDA behavior. We investigate the impact of lost
messages because not all distributed systems may support reliable
delivery of messages, several algorithms treat messages as
resources [GOLD77], and in some applications acknowledgements cannot be
sent.

III. RELIABILITY ANALYSIS OF DEADLOCK DETECTION ALGORITHMS

In this section, we examine four published deadlock detection
algorithms for distributed computing systems with respect to the
presence of the two classes of failures (lost messages and site
failures) discussed in section two. Although very few of them have
already been shown to be correct when no failures or errors occur, we
feel that their robustness is nevertheless worth analyzing. The
assumptions made by each author will be discussed in the context of how
robust the [...] resources and a single transaction. (These
restrictions merely make the example simpler; they are not required for
the analysis.) The initial system state is shown in figure 1.
Transaction T1 at site A holds resources R2 and R3 and is waiting for
resource R4. Transactions T2 and T3 hold no resources. Transaction T4
at site D holds resource R4, but is active. We assume that the deadlock
detection activity resulting from T1's request for R4 has been
completed, so there is no deadlock detection activity in the system.
For the algorithms which require global timestamps, we assign timestamp
(TS) t1 to the T1<--R2 assignment, t2 to the T4<--R4 assignment, t3 to
the T1<--R3 assignment, and t4 to the T1-->R4 request. Now at some time
t6, transaction T4 requests R3, resulting in a global deadlock
T1-->T4-->T1.


Figure  1 
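
The example state can be encoded directly as data. The sketch below
derives the transaction wait-for arcs from the resource assignments and
requests given in the text, before and after T4 requests R3. The
dictionary layout and the helper names wait_for_edges and in_cycle are
assumptions made for the illustration.

```python
def wait_for_edges(holds, requests):
    """Turn (transaction, resource) requests into (waiter, holder) arcs."""
    return {(t, holds[r]) for t, r in requests
            if r in holds and holds[r] != t}


def in_cycle(edges, start):
    """True if start can reach itself by following wait-for arcs."""
    succ = {}
    for waiter, holder in edges:
        succ.setdefault(waiter, set()).add(holder)
    stack, seen = list(succ.get(start, ())), set()
    while stack:
        node = stack.pop()
        if node == start:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(succ.get(node, ()))
    return False


# Resource assignments from the example: T1 holds R2 and R3, T4 holds R4.
holds = {"R2": "T1", "R3": "T1", "R4": "T4"}

# Initial state: only T1 is waiting (for R4).
before = wait_for_edges(holds, [("T1", "R4")])

# At time t6, T4 additionally requests R3.
after = wait_for_edges(holds, [("T1", "R4"), ("T4", "R3")])
```

Before the request the only arc is T1-->T4 and no transaction lies on a
cycle; after it the arcs T1-->T4 and T4-->T1 close the cycle.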


In the case of a site failure, we distinguish the following cases: a)
a site can have a transaction involved in a deadlock but not be
involved in detection, b) a site can have a transaction involved in a
deadlock and be involved in detection, c) a site can have a resource
involved in a deadlock and not be involved in detection, d) a site can
have a resource involved in a deadlock and be involved in detection, or
e) a site can be involved in deadlock detection but in no way involved
in a deadlock. Not all of these possibilities exist [...]


In [GOLD77], Goldman presents two deadlock detection algorithms.
Only the distributed version will be considered in this paper. A
Process Management Module (PMM) at each site handles resource
allocation and deadlock detection. An 'ordered blocked process list'
(OBPL) is a list of process names, each of which is waiting for access
to a resource assigned to the preceding process in the list. The
last process in the list is either waiting for access to the resource
named, or it has access to that resource. An OBPL is created each
time a PMM wants to see if a blocked process is involved in a
deadlock. In the distributed algorithm, an OBPL is passed from a PMM
to another PMM which has information either about a resource or a
transaction in the OBPL which is needed to expand the OBPL. Each PMM
adds the information it knows, and either detects a deadlock, detects
a non-deadlocked state, or passes the OBPL to another PMM for further
expansion. The terms process and transaction will be used
synonymously in the analysis of this DDA. If several transactions are
waiting on one transaction, multiple copies may be made of the OBPL and
sent to each site having one of those waiting transactions. Processes
can be in either of 2 states, active or blocked (waiting). A blocked
process could be waiting for a database object, message text from
another process or a message from an operator. A process is active if
it is not blocked. In the algorithm, PX and RX are temporary variables
representing a process or resource. The steps of the algorithm are:


1.  Set RX to the value contained in the resource [...]
    portion of the OBPL. If RX represents a local resource, go
    to 2. Otherwise, go to 8.

2.  [...]

3.  Let PX be the process controlling RX. If PX is already in the
    OBPL, then there is a deadlock. If not, go to 4.

4.  If PX is local to the current PMM, go to 5; otherwise go to 7.

5.  If PX is active, there is no deadlock. Discard the OBPL and
    halt. Otherwise go to 6.

6.  Add PX to the OBPL and go to 10.

7.  Add PX and RX to the OBPL. Send the OBPL to the PMM in the site
    in which PX resides. Halt.

8.  [...] process in [...] access [...] If
    not, there is no deadlock, so discard the OBPL and halt. If so,
    go to 9.

9.  If the last process in the OBPL is active, there is no deadlock,
    so discard the OBPL and halt. Otherwise go to 10.

10. Call the resource for which the last process is waiting RX. If
    RX is local, go to 3. Otherwise go to 11.

11. Place RX in the OBPL and send the OBPL to the PMM of the site in
    which RX resides. Halt.
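
The way an OBPL travels between sites can be imitated in a few lines of
Python. The sketch below is a simplification and not Goldman's
step-numbered procedure: it keeps only the list of processes, moves it
to the site of the resource or process being examined, and lets the
caller name site-to-site messages that are lost. The function name
detect, its parameters, the return strings and the placement of R2 at
site B and R3 at site C are assumptions made for the illustration.

```python
def detect(initiator, wants, holder_of, waiting_for, site_of,
           lost=frozenset()):
    """Expand an OBPL hop by hop.

    Returns 'deadlock', 'no deadlock', or 'lost' when a message carrying
    the OBPL is dropped and detection silently stops.
    """
    obpl = [initiator]          # ordered blocked process list
    rx = wants                  # resource currently being examined
    here = site_of[initiator]   # site whose PMM currently holds the OBPL

    def deliver(dst):
        """Move the OBPL to dst; report False if that message is lost."""
        nonlocal here
        if dst != here and (here, dst) in lost:
            return False
        here = dst
        return True

    while True:
        if not deliver(site_of[rx]):
            return "lost"
        px = holder_of.get(rx)          # process controlling RX
        if px is None:
            return "no deadlock"        # resource is free
        if px in obpl:
            return "deadlock"           # PX already in the OBPL
        if not deliver(site_of[px]):
            return "lost"
        if px not in waiting_for:
            return "no deadlock"        # PX is active
        obpl.append(px)
        rx = waiting_for[px]            # resource PX is waiting for


state = dict(
    initiator="T4",
    wants="R3",
    holder_of={"R2": "T1", "R3": "T1", "R4": "T4"},
    waiting_for={"T1": "R4", "T4": "R3"},
    site_of={"T1": "A", "T4": "D", "R2": "B", "R3": "C", "R4": "D"},
)

no_faults = detect(**state)
lost_c_to_a = detect(**state, lost={("C", "A")})
lost_a_to_d = detect(**state, lost={("A", "D")})
```

With no faults the OBPL visits sites D, C, A and D again, where the
cycle is found. Dropping either the C-to-A or the A-to-D message ends
detection without a result, because the single OBPL is the only record
of the partially built chain.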

Figure 2 shows the actions taken at each site during the
execution of the DDA following the request by T4 for resource R3. The
numbers refer to the current step being executed by the DDA. As can
be seen, the algorithm correctly detected the resulting deadlock, in
an environment of no faults. If, however, a message is lost (in our
example, either the OBPL sent from site C to A, or the OBPL sent from
A to D), the necessary information to detect the deadlock will be
lost, and the algorithm will fail to detect an existing deadlock.


Site A                   Site D                   Site C

                         10. Create OBPL with
                             T4. Set RX = R3.
                                                  3. T1 controls R3,
                                                     T1 not in OBPL.
                                                  4. T1 not local.
                                                  7. Add T1 and R3 to
                                                     OBPL and send
                                                     to site A.
1.  Set RX = R3.
8.  T1 has access to R3.
9.  T1 waiting.
10. Set RX = R4.
11. Add R4 to OBPL,
    send to site D.
                         1. Set RX = R4.
                         2. T1 waiting for
                            R4.
                         3. Set PX = T4. T4
                            already in OBPL,
                            deadlock detected.

Figure 2


Goldman's algorithm allows the following types of sites discussed
previously: type b (a site can have a transaction involved in
deadlock and the site is involved in detection), type d (a site can
have a resource held by a transaction involved in deadlock and the
site will be involved in deadlock detection), and type c (a site can
have a resource held by a transaction involved in a deadlock and not
be involved in deadlock detection). A site could also be in several
of the categories above, depending on the complexity of the system
state. For example, site D could be considered a type b or type d
site. If a site of type b (sites A or D in our example) fails during
execution of the DDA, the behavior could be different depending on the
time of the failure. If the failure occurred at site A before site C
sent the OBPL to site A, site C would realize that site A had failed.
The algorithm includes no procedure for this occurrence, so the
behavior would be dependent on the underlying system. If the failure
at site A occurred after it received the OBPL, all deadlock detection
activity will cease, because only site A was currently involved in
deadlock detection. A system timeout mechanism would eventually abort
the transactions involved in the deadlock. A failure at site D would
have the same effect as at site A.

If a site of type d (site C in our example) failed, the time
of the failure would again determine the behavior of the DDA. If the
failure occurred before site C sent the OBPL to site A, deadlock
detection activity would cease without deadlock having been detected.
If the OBPL had been sent, however, deadlock detection would continue
at sites A and D (sequentially) with site D detecting a deadlock. The
failure of site C would not have been critical after the OBPL had been
sent. The failure of a type c site (site B in our example)
would have no effect on the behavior of the DDA, because the fact that
R2 is held by T1 is not used or known by the DDA at any site.

There are essentially two types of OBPLs created by this
DDA. The first type, call it W, is created when a process is waiting
but is not involved in a deadlock. This OBPL is subsequently
discarded. The second type, call it D, is one which will eventually
show a deadlock cycle. If there are n transactions involved in a
deadlock cycle, this DDA will create from 1 to n type D OBPLs. In our
example, only one was created. If the request by T1 for resource R4
had happened simultaneously with the request by T4 for resource R3, two
OBPLs would have been created, which would have resulted in two sites
independently detecting the same deadlock, instead of the one site in
our example. Thus the robustness of this algorithm with respect to a
single site failure is related to the ratio of the number of type D
OBPLs created to the number of transactions involved in the deadlock.
This ratio is, however, determined by the sequencing or timing of
transactions' messages [...] blocked resources [...]. Such sequencing
is of a random nature. A ratio of 1 would provide the highest degree
of robustness. When only a single OBPL is created, the robustness of
the DDA is very similar to that of a centralized DDA: a single site
failure can stop deadlock detection activity. We conclude that the
robustness of this DDA can be analyzed but it cannot be predicted.


In [MENA79], Menasce and Muntz presented a distributed
deadlock detection algorithm. Gligor and Shattuck [GLIG80] presented a
counterexample which showed the algorithm to be incorrect in that it
failed in some cases to detect a deadlock. They also proposed a
modification to the algorithm which they thought would make it
correct, but they felt the algorithm was impractical. In [TSAI82],
Tsai and Belford show that the algorithm as modified by Gligor and
Shattuck is also incorrect. Nevertheless, we will investigate the
enhanced algorithm (i.e., its modified version as suggested by Gligor
and Shattuck) in the presence of errors.

The algorithm constructs a Transaction-Waits-For (TWF) graph
at originating sites of transactions which are potentially involved in
the deadlock being detected, and at sites at which some transaction
could not acquire a resource. Nodes in the TWF graphs represent
transactions. An edge (Ti,Tj) indicates that transaction Ti is waiting
for transaction Tj. A non-blocked transaction is a transaction that
is not waiting and is represented in the TWF graph by a node with no
outgoing arcs. A blocked transaction is waiting for some transaction
to finish. A 'blocking set' is defined as the set of all non-blocked
transactions which can be reached by following a directed path in the
TWF graph starting at the node associated with transaction T [MENA79].
[...] (T,T') is a 'blocking pair' of T if T' is in the blocking set
[...]. A 'potential blocking set' consists of all waiting transactions
that [...]. Sorig(T) means the site of origin of transaction T. Sk is
the site currently executing the algorithm. The rules which define the
enhanced algorithm, as executed at site Sk, are:


[...] When a transaction [...] marked 'waiting'.

Rule 1: The resource R at site Sk cannot be allocated to
transaction T because it is held by T1,...,Tk. Add an arc from
T to each of the transactions T1,...,Tk. If there is then a
cycle formed in the TWF graph, deadlock has been detected.
Otherwise, for each transaction T' in blocking set(T), send
the blocking pair (T,T') to Sorig(T) if Sorig(T) =/= Sk and
to Sorig(T') if Sorig(T') =/= Sk. [...]
blocking pairs associated with T. [...]

Rule 2: A blocking pair (T,T') is received. Add an arc from T
to T' in the TWF graph. If a cycle is formed, then a
deadlock exists.

Rule 2.1: If T' is blocked and Sorig(T') =/= Sk, then for each
transaction T" in the blocking set(T'), send the blocking
pair (T,T") to Sorig(T") if Sorig(T") =/= Sk.

Rule 2.2: If T is waiting and Sorig(T) = Sk, then for each
potential blocking pair (T",T) send the blocking pair (T",T')
to Sorig(T") if Sorig(T") =/= Sk. Then, discard the
potential blocking pairs (T",T) and erase the 'waiting' mark of
T.
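
The blocking set used by the rules above can be computed by a simple
graph traversal. The following sketch follows the definition given
earlier (the non-blocked transactions reachable from T in the TWF
graph); the dictionary representation of the graph and the function
names are assumptions made for the illustration, and a transaction with
no outgoing arcs is treated as having an empty blocking set.

```python
def blocking_set(twf, t):
    """Non-blocked transactions reachable from t along wait-for arcs.

    twf maps a transaction to the set of transactions it waits for;
    a transaction with no outgoing arcs is non-blocked.
    """
    result, seen, stack = set(), {t}, [t]
    while stack:
        node = stack.pop()
        for nxt in twf.get(node, ()):
            if nxt in seen:
                continue
            seen.add(nxt)
            if twf.get(nxt):        # nxt has outgoing arcs: blocked
                stack.append(nxt)
            else:                   # no outgoing arcs: non-blocked
                result.add(nxt)
    return result


def blocking_pairs(twf, t):
    """All pairs (t, t2) with t2 in the blocking set of t."""
    return {(t, t2) for t2 in blocking_set(twf, t)}


# T1 waits for T2, which waits for T3 and T4; T3 and T4 are running.
chain = {"T1": {"T2"}, "T2": {"T3", "T4"}, "T3": set()}

# The deadlock of the running example: every reachable node is blocked.
cycle = {"T1": {"T4"}, "T4": {"T1"}}
```

In the chain, the blocking set of T1 is {T3, T4}. In the cycle the
blocking set of T4 is empty, which is consistent with a deadlock: no
running transaction remains whose completion could release the waiters.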


Figure 3 shows the actions taken at each site during the
execution of the DDA following the request by T4 for resource R3. As
can [...] site A, in the absence of failures. If the request message
(T4,R3) from site D to site [...] was lost, however, deadlock detection
activity would cease. If the blocking pair [...] site [...] was lost,
site A would still [...] however, the blocking pair (T4,T1) from site
[...] activity would cease.


[Figure 3: actions taken at each site during execution of the enhanced
algorithm after T4 requests R3. Recoverable entries: T4 is marked
waiting; the blocking pair (T4,T1) is sent to sites D and A; potential
blocking pairs = nil; after (T4,T1) is received the TWF graph contains
T1-->T4 and T4-->T1, and deadlock is detected.]

Figure 3


This algorithm allows sites of types b, c, d and e, although
our example does not include a site of type e. If a type b site (one
having a transaction involved in the deadlock and also involved in
detection) failed, in our example site A (or site D), the behavior of
the algorithm is dependent on the time of failure. If site A failed
before receiving the blocking pair (T4,T1), site D would recognize the
failure, but its action is not specified in the rules of the DDA. Site
D would not detect the deadlock for the same reason as if the message
from site C to site A was lost. If, however, the failure occurred
after site A received the blocking pair, deadlock detection activity
would continue at site D, but deadlock would not be detected. A
failure of site D, also a type b site, at any time, would have no
effect on detecting the deadlock in this example. If a type c site
failed (site B), it would have no effect on detecting the deadlock. If
a type d site (site C) failed, the time of its failure would determine
the behavior of the DDA. If it failed before sending the blocking pair
to sites A and D, deadlock detection activity would cease. If it
failed after sending those messages, it would have no effect on
detecting the deadlock.

For our example, this algorithm behaved surprisingly similarly
to Goldman's algorithm in almost all types and timings of
failures. This may just be an anomaly found in small deadlock cycles,
because in longer and more complex scenarios, it would appear that
more sites would be involved in detection, and that there would be
some duplication of information. As the number of transactions (and
resources) involved in a deadlock cycle increases, more blocking pairs
and potential blocking pairs will be sent to more sites, i.e., the
number of sites detecting the deadlock increases with the number
of transactions involved in the deadlock and with the deadlock
topology (or complexity). Thus there will be more chance of a deadlock
being detected, as more parallel detection activity will be in
progress. It appears, then, that as the size and complexity of the
deadlock increases, the robustness of this algorithm increases.
However, the effect pointed out by Gligor and Shattuck (Rule 2.2
discarding information too early) may have some impact on the increased
robustness.

1. OBERMARCK'S DISTRIBUTED DEADLOCK DETECTION ALGORITHM.


In [OBER80], Obermarck presents a distributed deadlock detection
algorithm. A centralized algorithm is presented by Obermarck and
Beeri in [BEER81], but it is not discussed here because no mention is
made in that paper about a backup capability if the site containing
the centralized deadlock detector fails. Obermarck's distributed
algorithm constructs a transaction-waits-for (TWF) graph at each site.
Each site conducts deadlock detection simultaneously, passing
information to one other site. Deadlock detection activity at a site
may become temporarily inactive until receipt of new information from
another site. Obermarck states that in actual practice,
synchronization (not necessarily precise) between sites would be
roughly controlled by an agreed-upon interval between deadlock
detection iterations, and by timestamps on transmitted messages. Nodes
in the graph represent transactions, and edges represent a
transaction-waits-for-transaction (TWFT) situation. A 'string' is a
list of TWFT information which is sent from one site to one or more
sites. A transaction may migrate from site to site, in which case an
'agent' represents the transaction at the new site(s). A communication
link is also established between agents of a transaction. These
communication links are represented by a node called 'External.' An
agent which is expected to send a message is shown in the TWF graph by
EX-->T, while an agent waiting to receive is shown by T-->EX. Although
Obermarck's algorithm includes the resolution of deadlocks, only the
detection part will be considered in this paper. Transaction IDs are
network unique names for transactions, and are lexically ordered. (For
example, T1 < T2 < T3). The steps performed at each site are:

1. Build a TWF graph using transaction to transaction wait-for
   relationships.

2. Obtain and add to the existing TWF graph any 'strings'
   transmitted from other sites.

   a. For each transaction identified in a string, create a
      node in the TWF graph if none exists in this site.

   b. For each transaction in the string, starting with the
      first (which is always 'external'), create an edge to the
      node representing the next transaction in the string.

3. Create wait-for edges from 'external' to each node representing
   a transaction's agent which is expected to send on a
   communication link.

4. Create a WF edge from each node representing a transaction's
   agent which is waiting to receive from a communication link,
   to 'external.'

5. Analyze the graph for cycles.

6. After resolving all cycles not involving 'external', if the
   transaction ID of the node for which 'external' waits is
   greater than the transaction ID of the node waiting for
   'external', then

   a. Transform the cycle into a string which starts with
      'external', followed by each transaction ID in the cycle,
      ending with the transaction ID of the node waiting for
      'external'.

   b. Send the string to each site for which the transaction
      terminating the string is waiting to receive.
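
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Obermarck's implementation: the function names (build_twf, has_deadlock, strings_to_send) are hypothetical, transaction IDs are plain strings compared lexically, cycle resolution is omitted, and the two-site scenario at the end is invented to exercise the lexical ordering rule of step 6.

```python
EX = "EX"  # the 'External' node


def build_twf(local_waits, expected_senders, waiting_receivers, strings=()):
    """Steps 1-4: assemble one site's TWF graph as {node: set(successors)}."""
    graph = {}

    def add_edge(src, dst):
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())

    for waiter, holder in local_waits:        # step 1: local wait-for edges
        add_edge(waiter, holder)
    for string in strings:                    # step 2: strings from other sites
        for src, dst in zip(string, string[1:]):
            add_edge(src, dst)
    for agent in expected_senders:            # step 3: EX -> T
        add_edge(EX, agent)
    for agent in waiting_receivers:           # step 4: T -> EX
        add_edge(agent, EX)
    return graph


def has_deadlock(graph):
    """Step 5: True if some cycle does not pass through 'External'."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph if node != EX}

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if nxt == EX:
                continue
            if color[nxt] == GREY or (color[nxt] == WHITE and visit(nxt)):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in color)


def strings_to_send(graph):
    """Step 6: cycles through 'External' that pass the lexical-order test."""
    out = []

    def walk(node, path):
        for nxt in sorted(graph.get(node, ())):
            if nxt == EX:
                # path[1] is the node 'external' waits for,
                # path[-1] is the node waiting for 'external'.
                if len(path) > 1 and path[1] > path[-1]:
                    out.append(path)
            elif nxt not in path:
                walk(nxt, path + [nxt])

    walk(EX, [EX])
    return out


# Hypothetical sending site: T4 waits for T1 locally, T4's agent is expected
# to send, and T1's agent is waiting to receive.
sender = build_twf([("T4", "T1")], ["T4"], ["T1"])
print(strings_to_send(sender))      # [['EX', 'T4', 'T1']]

# Hypothetical receiving site: T1 waits for T4 locally; merging the string
# closes a cycle that no longer involves 'External'.
receiver = build_twf([("T1", "T4")], [], [], strings_to_send(sender))
print(has_deadlock(receiver))       # True
```

With the roles of T1 and T4 swapped at the sending site, the potential cycle would be EX -> T1 -> T4 -> EX, the test T1 > T4 would fail, and no string would be sent from that site. This is the message optimization that the lexical ordering provides.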


In his proof of correctness, Obermarck shows how the algorithm
can detect false deadlocks because a string received at a site
may no longer be valid when it is used. He discusses two methods of
handling false deadlocks: treat them as actual deadlocks (if they don't
occur too often), or verify them by sending them around the network
and having each site verify them.

Figure 4


Figure 4 shows a global picture of the system, including the
communication links established between agents, for the initial
conditions of our example. The agents of T1 at sites 2 and I have
[...] cease without detecting the deadlock. The use of agents to
represent transactions which have migrated to other sites allows this
DDA to have nodes of types a or b, if we substitute 'agents' for
'transactions' in our definitions at the beginning of this section.
Site 3 would be an example of a type a site, while the other three
sites would all be type b sites.


[...] effect on the behavior of the DDA. A failure at sites A, 3 or 1
would either have no effect, an undetermined effect, or cause deadlock
detection activity to cease, depending on the time of the failure. For
example, if site 2 failed before sending the string 'ZI,Tl,?l! to
site A, deadlock detection activity would cease. If site A (or D)
failed before the string (EX,T4,T1) was sent to them, the transmitting
site would recognize the failure, but its action in that eventuality
is not included in the steps of the DDA. If site 2 failed after
sending the string, the detection activity would continue, and the
deadlock would be detected.

This DDA appears to be potentially more robust than the previous
two. Each site contains and retains more information in its WF
graph, and all sites start detection activity simultaneously and
potentially stay involved for the entire detection process. The
lexical ordering of nodes was used to optimize the number of messages
transmitted. If this constraint were lifted, the strings would be sent
to all sites involved from all sites in which a cycle existed. In our
example, this would have allowed sites A and D to simultaneously
detect deadlock. The DDA would clearly be more robust, but the
overhead would be greater. In its existing form, this DDA's
robustness is similar to the previous algorithms because it is essen-


2. THE ALGORITHM OF TSAI AND BELFORD.

In [TSA82], Tsai and Belford present a distributed deadlock
detection algorithm. They utilize a "Reduced Transaction-Resource"
(RTR) graph, which contains only a subset of the transaction resource
graph, but has all relevant TWF edges. Nodes in the RTR graph can be
transactions or resources. The algorithm uses a concept the authors
call a "reaching pair", which is the basic unit of information passed
from site to site. If a path TiTj...Tn can be formed by following TWF
edges, and if there is a request edge (Tn,Rm), then Ti "reaches" Rm,
and (Ti,Rm) is a "reaching pair." Five types of messages are sent
between sites: reaching messages, nonlocal request messages,
allocation messages, release-request messages, and releasing messages.
The nonlocal request messages include a list of all resources
currently held by the requesting transaction. Five different types of
edges are distinguished in the RTR graph: requesting edges, allocation
edges, TWF edges, resource reaching edges and transaction reaching
edges. A global timestamp is also used to establish an ordering of
events. This timestamp is used on allocation, request and reaching
messages, and on allocation and reaching edges in the RTR graph. The
notation used in the algorithm is:

TS(M):  timestamp of a message
TS(C):  current system time
TS(A):  timestamp of an allocation edge
TS(R):  timestamp of a reaching edge
=/=:    not equal to
Sorig:  site of origin
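
The definition of a reaching pair given above can be illustrated with a short Python sketch. It is not taken from [TSA82]: the function name reaching_pairs and the sample edges are hypothetical, and the sketch assumes that the path TiTj...Tn must contain at least one TWF edge, so a transaction's own request edge does not by itself form a reaching pair.

```python
def reaching_pairs(twf_edges, request_edges):
    """Return every (Ti, Rm) such that Ti reaches Rm.

    twf_edges     -- iterable of (Ti, Tj): Ti waits for Tj
    request_edges -- iterable of (Tn, Rm): Tn requests Rm
    """
    successors = {}
    for waiter, holder in twf_edges:
        successors.setdefault(waiter, set()).add(holder)
    requests = {}
    for txn, res in request_edges:
        requests.setdefault(txn, set()).add(res)

    pairs = set()
    for start in successors:
        # Depth-first walk along TWF edges, starting one edge away from Ti.
        stack = list(successors[start])
        seen = set(stack)
        while stack:
            txn = stack.pop()
            for res in requests.get(txn, ()):
                pairs.add((start, res))
            for nxt in successors.get(txn, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return pairs


# Hypothetical example: T1 waits for T2, T2 waits for T3, T3 requests R5.
print(sorted(reaching_pairs([("T1", "T2"), ("T2", "T3")], [("T3", "R5")])))
# [('T1', 'R5'), ('T2', 'R5')]
```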


The steps of the algorithm (as executed at site Sk) are:

Step 1:  {A transaction T enters the system requesting a nonlocal
         resource R} Add request edge (T,R) to RTR graph. Send
         request message (T,R',R,TS) to Sorig(R), where R' is the set
         of all resources allocated to T, and TS(M) = TS(C). R' has
         each TS(A) attached, and R' is empty if T holds no
         resources.

Step 1a: {A transaction T releases a nonlocal resource R} Erase edge
         (R,T) in the RTR graph. Send a release-request message (R,T)
         to Sorig(R).

Step 2:  {A transaction T enters system requesting local resource R}
         Go to step 4.

Step 2a: {A transaction T releases a local resource R} Erase edge
         (R,T) in RTR graph. If there is any transaction T' waiting
         for R, then begin
            Add allocation edge (R,T') to RTR graph with TS(A) =
            TS(C). Send allocation message (R,T',TS) with TS(M) =
            TS(C) to Sorig(T') if Sorig(T') =/= Sk. end.

Step 3:  {A request message (T,R',R,TS) is received} Add allocation
         edges (Ri,T) for each Ri in R' to RTR graph. Go to step 4.

Step 3a: {A release-request message (R,T) is received} Erase
         allocation edge (R,T) in RTR graph. Send releasing message
         (R,T) to Sorig(T). If there is any transaction T' waiting
         for R, then begin
            Add allocation edge (R,T') to RTR graph with TS(A) =
            TS(C). Send allocation message (R,T',TS) to Sorig(T')
            if Sorig(T') =/= Sk. end.

Step 4:  If R is not held by any transaction, then begin
            Add allocation edge (R,T) with TS(A) = TS(C) to RTR
            graph. If Sorig(T) =/= Sk, then send an allocation
            message (R,T,TS) with TS(M) = TS(C) to Sorig(T). end.
         else begin
            Add requesting edge (T,R) to RTR graph. Suppose R is
            held by transaction T'. Add edge (T,T') to RTR graph.
            If there is a cycle, deadlock has been detected, else
            go to step 5. end.

Step 5:  {Reaching message generation step} If there are two edges
         (T,R) and (T,T') added to the graph, and if TT'...T" is any
         path obtained by following the TWF and transaction reaching
         edges, then set X = R" if T" has outgoing edge to R", else
         set X = R. For all transactions Ti in RTR graph reaching X
         via T, do begin
            If Ti holds any resource R' with Sorig(Ti) =/=
            Sorig(R') and Sorig(R') =/= Sk, then send a reaching
            message (Ti,X,TS) to Sorig(R'). If Sorig(Ti) =/= Sk
            and Ti =/= T, then send a reaching message (Ti,X,TS) to
            Sorig(Ti). If Sorig(Ti) =/= Sk and Ti = T and X = R"
            then send a reaching message (Ti,X,TS) to Sorig(Ti).
            The TS in the reaching message is set to TS(C) if
            triggered by a local request, and set to TS(M) of the
            nonlocal request or reaching message otherwise.

Step 6:  {An allocation message (R,T,TS) is received} If R is an entry
         in the graph, then begin
            Erase allocation edge (R,T') and all reaching edges
            (T",R) with TS(R) < TS(M), and the corresponding TWF
            [illegible] they exist, where T' =/= T. Change
            requesting edge (T,R) to allocation edge (R,T) with
            TS(A) = TS(C) if (T,R) exists, and for each resource
            reaching edge (T",R), add the transaction reaching edge
            (T",T). If Sorig(T) = Sk, wake up transaction T. end.

Step 6a: {A releasing message (R,T) is received} If Sorig(T) = Sk,
         wake up transaction T.

Step 7:  {A reaching message (T,R,TS) is received} If there exists an
         allocation edge (R,T') in the graph with TS(M) < TS(A) and
         T' =/= T, then skip this step, else begin
            Add resource reaching edge (T,R) to the RTR graph. If
            R is held by transaction T', then add the transaction
            reaching edge (T,T') to the graph. If there is a cycle
            in the graph, there is deadlock (go to step 8);
            otherwise go to step 5. end.

Step 8:  {A deadlock has been detected} Take appropriate action.


Figure 5 shows the starting WF graphs and the actions of the
DDA resulting from the request by transaction T4 for resource R3. An
important item to note is that as soon as the request is made, step 1
adds sufficient information to the WF graph to detect a deadlock, but
does not check for deadlock, so the request is sent to site C and the
algorithm continues. The obvious thing to do would be to add a check
for a deadlock cycle in step 1, but on closer analysis, this check
may lead to detection of false deadlocks (if, for example, T1 had just
released R3 but the message had not yet been received by site D).
Therefore the algorithm in its present form will be analyzed. The
only message sent by this algorithm in this example is the request
message (T4,{R4},R3,TS). If it was lost, the current algorithm would
cease detection activity without detecting deadlock. In this
instance, if the algorithm checked for deadlock in step 1, it would
have been detected with no messages required.
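
The cycle check performed when a request reaches the resource's site (steps 3 and 4) can be sketched as follows. This is a simplified illustration, not the authors' code: timestamps and message sending are omitted, the names has_cycle and step4 are hypothetical, and the initial edges (T1 holds R3 and requests R4) are our reading of the example and should be treated as an assumption.

```python
def has_cycle(edges):
    """Depth-first search for a directed cycle in a set of (src, dst) edges."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    state = {}  # node -> "active" while on the DFS stack, "done" afterwards

    def visit(node):
        state[node] = "active"
        for nxt in graph[node]:
            if state.get(nxt) == "active":
                return True
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in graph)


def step4(edges, holders, txn, res):
    """Handle a request by txn for res; return True if deadlock is detected."""
    holder = holders.get(res)
    if holder is None:
        holders[res] = txn
        edges.add((res, txn))     # allocation edge (R,T)
        return False
    edges.add((txn, res))         # requesting edge (T,R)
    edges.add((txn, holder))      # TWF edge (T,T')
    return has_cycle(edges)


# Assumed starting state at the site owning R3.
edges = {("R3", "T1"), ("T1", "R4")}
holders = {"R3": "T1"}

# Step 3: the request message (T4,{R4},R3,TS) adds allocation edge (R4,T4).
edges.add(("R4", "T4"))
holders["R4"] = "T4"

# Step 4: adds (T4,R3) and (T4,T1), which closes the cycle.
print(step4(edges, holders, "T4", "R3"))   # True
```

Running the same check at the requesting site in step 1 would need no message at all, but it would rely on that site's possibly stale view of who holds R3, which is the source of the false deadlocks noted above.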


[Figure: WF graphs at Site A, Site B, Site C and Site D for the
request by T4 for R3. Annotated actions: (T4 requests R3) add
(T4,R3), send (T4,{R4},R3,TS); 3: add (R4,T4); 4: add (T4,R3), add
(T4,T1); DEADLOCK DETECTED.]

Figure 5


For this DDA, sites can be of type b, d or e. Sites A and D
are type b and sites B and C are type d. This example has no type e
sites, but step 5 of the algorithm could send reaching messages to
sites not involved at all. Those sites would execute a step or two of
the algorithm, but not be intimately involved in the actual deadlock
detection. In this example, a failure of sites A or B (types b and d
respectively) would have no effect on the detection of the deadlock.
The effect of a failure of site Z before the reaching message was sent
to it cannot be determined because the DDA includes no instructions
for that event. A failure of site Z after receiving the reaching mes-
[...] 't in a cessation of detection activity.
[...] site C at any time would have no effect on deadlock detection.
The timing of the failure would also determine the behavior of the DDA
if site D failed. If site D failed before sending the request message,
detection activity would cease, while if the message had been sent,
deadlock would still be detected.

For our example, this DDA appears to be about the same level of
robustness as the other algorithms, except that each site contains and
retains more information than in other DDAs. This indicates that it
should  be  more  robust.  The  algorithm  in  the  case  of  our  example  was 
able  to  detect  the  deadlock  with  only  the  resource  request  message. 
As  deadlock  cycles  become  more  complex,  it  appears  that  this  algorithm 
will  also  become  more  robust,  even  more  so  than  Obermarck's,  because 
this  DDA  retains  more  information,  and  it  will  send  reaching  messages 
to  any  site  potentially  involved  in  the  deadlock.  Detection  activity 
will  occur  simultaneously  in  those  sites  receiving  reaching  messages. 
The  impact  of  the  inclusion  of  a  cycle  detection  in  step  1  may  have 
adverse  effects  on  the  correctness,  but  it  might  greatly  enhance  the 
robustness  of  the  DDA. 

IV. CONCLUSIONS

The algorithms discussed in the previous section can be loosely
ranked by their robustness. Goldman's algorithm is the least robust,
because it is always executed sequentially (unless the requests occur
simultaneously, as discussed previously). Thus it is always dependent
on a single node. Obermarck's algorithm starts deadlock detection
simultaneously at all sites, and subsequently passes information in a
lexical manner because of the message optimization. For our example,
this resulted in a sequential detection, although for larger deadlock
cycles, it should have some parallel detection activity occurring. The
Menasce-Muntz algorithm starts detection at the site where the
deadlock occurred, and deadlock detection is subsequently conducted at
sites which are potentially involved. The Tsai-Belford algorithm is
invoked each time a resource is requested. Deadlock detection can
occur concurrently at all sites potentially involved in the cycle.
It appears more robust than the Menasce-Muntz algorithm because more
information is held at each site.

Our analysis supports the rather obvious conclusion that greater
robustness comes at greater cost. The Tsai-Belford algorithm
appears more robust than Obermarck's algorithm, for example, but it
maintains larger WF graphs at each site, and is invoked each time a
resource is requested, in order that the WF graphs contain sufficient
information.

For the example we used to analyze the four algorithms in section
3, the behavior of each of those algorithms in the presence of errors
is almost identical. Because our deadlock cycle only involved 2
transactions, those algorithms which are potentially more robust in
the presence of larger cycles did not have time to develop their
robustness. In other words, for a short deadlock cycle, all the
algorithms converged within approximately the same length of time (two
or three iterations). Short cycles of length 2 or 3 are more probable
in existing applications, so all the above algorithms are
approximately equally robust in current applications. In future
applications (information utility programs, for example), however, we
expect a much higher probability of more complex deadlock cycles,
which will require a more robust DDA. Conversely, however, as the
number of transactions (and sites) increases, it will be important to
use a minimum cost DDA.


Work is currently in progress on a new robust distributed deadlock
detection algorithm.